Re: Runaway Apache Process

on 27.01.2010 14:28:11 by Dan Bunyard

I have never done a backtrace. Can you please point me in the right
direction for that?

I didn't check CPU usage at the time, only the load average, which was around
100 (normally it's between 0.02 and 0.5 over 1 minute).
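To put that number in scale, here is a minimal Python sketch that divides a
load average by the core count. The figures are the ones from this thread
(load around 100, dual core); the function names are hypothetical. On Linux the
load average also counts processes blocked in uninterruptible I/O, so a high
value by itself does not prove the CPUs were busy.

```python
import os


def load_per_core(load_1min, cores):
    """Return the 1-minute load average divided by the number of cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_1min / cores


def current_load_per_core():
    """Same ratio for the live machine (Unix only; helper name is an assumption)."""
    return load_per_core(os.getloadavg()[0], os.cpu_count() or 1)


# Figures reported in this thread: load average ~100 on a dual-core box.
print(load_per_core(100.0, 2))
```

Anything well above 1.0 per core means work is queuing up faster than the box
can clear it.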

I was able to log in but it was VERY slow. As I watched, the load average
was still climbing just before I killed Apache. It did not terminate
gracefully either; the error_log showed this:
[Mon Jan 25 12:50:49 2010] [warn] child process 23437 still did not exit, sending a SIGTERM
[Mon Jan 25 12:50:49 2010] [warn] child process 23441 still did not exit, sending a SIGTERM
[Mon Jan 25 12:50:49 2010] [warn] child process 23445 still did not exit, sending a SIGTERM
[Mon Jan 25 12:50:49 2010] [warn] child process 23451 still did not exit, sending a SIGTERM
[Mon Jan 25 12:50:49 2010] [warn] child process 23453 still did not exit, sending a SIGTERM
[Mon Jan 25 12:50:49 2010] [warn] child process 28350 still did not exit, sending a SIGTERM
[Mon Jan 25 12:50:49 2010] [warn] child process 28355 still did not exit, sending a SIGTERM
[Mon Jan 25 12:50:49 2010] [warn] child process 26939 still did not exit, sending a SIGTERM
[Mon Jan 25 12:50:51 2010] [warn] child process 23437 still did not exit, sending a SIGTERM
[Mon Jan 25 12:50:51 2010] [warn] child process 23441 still did not exit, sending a SIGTERM
[Mon Jan 25 12:50:51 2010] [warn] child process 23445 still did not exit, sending a SIGTERM
[Mon Jan 25 12:50:51 2010] [warn] child process 23451 still did not exit, sending a SIGTERM
[Mon Jan 25 12:50:51 2010] [warn] child process 23453 still did not exit, sending a SIGTERM
[Mon Jan 25 12:50:51 2010] [warn] child process 28350 still did not exit, sending a SIGTERM
[Mon Jan 25 12:50:51 2010] [warn] child process 28355 still did not exit, sending a SIGTERM
[Mon Jan 25 12:50:51 2010] [warn] child process 26939 still did not exit, sending a SIGTERM
[Mon Jan 25 12:50:53 2010] [warn] child process 23437 still did not exit, sending a SIGTERM
[Mon Jan 25 12:50:53 2010] [warn] child process 23441 still did not exit, sending a SIGTERM
[Mon Jan 25 12:50:53 2010] [warn] child process 23445 still did not exit, sending a SIGTERM
[Mon Jan 25 12:50:53 2010] [warn] child process 23451 still did not exit, sending a SIGTERM
[Mon Jan 25 12:50:53 2010] [warn] child process 23453 still did not exit, sending a SIGTERM
[Mon Jan 25 12:50:53 2010] [warn] child process 28350 still did not exit, sending a SIGTERM
[Mon Jan 25 12:50:53 2010] [warn] child process 28355 still did not exit, sending a SIGTERM
[Mon Jan 25 12:50:53 2010] [warn] child process 26939 still did not exit, sending a SIGTERM
[Mon Jan 25 12:50:55 2010] [error] child process 23437 still did not exit, sending a SIGKILL
[Mon Jan 25 12:50:55 2010] [error] child process 23441 still did not exit, sending a SIGKILL
[Mon Jan 25 12:50:55 2010] [error] child process 23445 still did not exit, sending a SIGKILL
[Mon Jan 25 12:50:55 2010] [error] child process 23451 still did not exit, sending a SIGKILL
[Mon Jan 25 12:50:55 2010] [error] child process 23453 still did not exit, sending a SIGKILL
[Mon Jan 25 12:50:55 2010] [error] child process 28350 still did not exit, sending a SIGKILL
[Mon Jan 25 12:50:55 2010] [error] child process 28355 still did not exit, sending a SIGKILL
[Mon Jan 25 12:50:55 2010] [error] child process 26939 still did not exit, sending a SIGKILL
[Mon Jan 25 12:50:56 2010] [notice] caught SIGTERM, shutting down
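
The excerpt is easier to read when grouped by child PID. A rough Python sketch
of that, assuming each entry sits on a single line as it does in the real
error_log (mail wrapping undone); the function name and the sample list are
illustrative only:

```python
import re
from collections import defaultdict

# Matches entries such as:
# [Mon Jan 25 12:50:49 2010] [warn] child process 23437 still did not exit, sending a SIGTERM
LINE_RE = re.compile(
    r"^\[(?P<when>[^\]]+)\] \[(?P<level>\w+)\] "
    r"child process (?P<pid>\d+) still did not exit, "
    r"sending a (?P<signal>SIG[A-Z]+)$"
)


def summarize_shutdown(lines):
    """Return {pid: [signal, ...]} in the order the signals were logged."""
    signals_by_pid = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # other entries (e.g. the final [notice]) are skipped
            signals_by_pid[int(match.group("pid"))].append(match.group("signal"))
    return dict(signals_by_pid)


sample = [
    "[Mon Jan 25 12:50:49 2010] [warn] child process 23437 still did not exit, sending a SIGTERM",
    "[Mon Jan 25 12:50:51 2010] [warn] child process 23437 still did not exit, sending a SIGTERM",
    "[Mon Jan 25 12:50:53 2010] [warn] child process 23437 still did not exit, sending a SIGTERM",
    "[Mon Jan 25 12:50:55 2010] [error] child process 23437 still did not exit, sending a SIGKILL",
    "[Mon Jan 25 12:50:56 2010] [notice] caught SIGTERM, shutting down",
]
for pid, signals in summarize_shutdown(sample).items():
    print(pid, " -> ".join(signals))
```

Run over the full excerpt, this would show that each of the eight children
ignored three SIGTERMs and only went away on SIGKILL.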

Is there a way to turn on more logging (debug logs) or a better way to trace
what it is doing at that time?
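
For anyone following the thread, here is a minimal sketch of what "backtraces
at intervals with gdb" could look like when scripted in Python. It is not a
tested procedure for this server. It assumes gdb is installed, that it runs as
root or as the user owning the httpd child, and that debug symbols for httpd
and its modules are available (without them the frames are mostly unreadable).
The function names, sample count, and interval are made up for illustration.

```python
import subprocess
import time


def gdb_backtrace_command(pid):
    """Build a gdb call that attaches to pid, prints every thread's backtrace, and exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def sample_backtraces(pid, samples=5, interval=2.0, run=subprocess.run):
    """Collect `samples` backtraces of `pid`, waiting `interval` seconds between them.

    `run` is injectable so the loop can be exercised without gdb installed.
    """
    outputs = []
    for i in range(samples):
        result = run(gdb_backtrace_command(pid), capture_output=True, text=True)
        outputs.append(result.stdout)
        if i < samples - 1:
            time.sleep(interval)
    return outputs
```

A call such as `sample_backtraces(23437)` would target one of the stuck
children listed in the log above. If the same function sits at the top of the
stack in every sample, that is probably where the process is spending its time.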

Thanks,
--
Dan

http://www.moonlightrpg.com
http://www.linkedin.com/in/danbunyard
http://www.danodemano.com
http://www.dansrandomness.com
http://www.danandshelley.com

This is not a problem that requires infinite wisdom, Benj. This is a problem
that requires enough neural organization to qualify as a vertebrate,
apparently a stretch for some folks these days.
~Cecil Adams.

On Wed, Jan 27, 2010 at 08:18, Jeff Trawick wrote:

> On Tue, Jan 26, 2010 at 8:28 PM, Dan Bunyard wrote:
> > This has happened twice now and it's a little bit concerning to me. I have a
> > Fedora 12 server with 5GB of RAM that I use to host a few small web sites of
> > mine. As I mentioned, this happened once before. I tried to load one of my
> > web sites today and it took FOREVER (as in the 10s of minutes) to load. I
> > SSHed into the box and found the load average around 100 (dual core
> > machine). Since this was the second time it had happened, I knew that it was
> > Apache causing it. So I restarted the Apache service and everything returned
> > to normal. A look in the error_log showed this error:
> >
> > server reached MaxClients setting, consider raising the MaxClients setting
> >
> > I suspect that this is the reason that Apache was eating up all my system
> > resources but I don't have any idea how to fix it.
>
> This means that you have 100 active client connections, and that's the
> limit of your configuration (MaxClients=100).
>
> I didn't catch whether or not you had high CPU utilization.
>
> I didn't catch whether or not you had a high number of requests being
> processed during this time.
>
> High CPU utilization, relatively low number of requests:
>
> I'd guess that some application code running inside Apache encounters
> an unexpected situation that results in loops or other extremely high
> CPU that prevents the request from being completed within a reasonable
> period of time (or ever). The fact that you could log in after a
> while suggests that some of this faulty request processing does
> eventually finish.
>
> High CPU utilization, relatively high number of requests:
>
> Your server is just being overwhelmed -- application request
> processing requires noticeable CPU, and the box can't handle large
> numbers of concurrent requests. Likely some application-level
> optimization will help.
>
>
> If you pick an httpd child process and get backtraces of it at
> intervals with gdb to see where it is spending its time, that might
> provide valuable clues.
>
